A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++
Author
Abstract
This paper studies the k-means++ algorithm for clustering as well as the class of D^ℓ sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by D^ℓ sampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known O(log k) guarantee for the case β = 1 to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability.
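To make the sampling procedure concrete, here is a minimal pure-Python sketch of D^ℓ sampling with oversampling to ⌈βk⌉ centers. It assumes Euclidean data, a uniformly random first center, and that each later center is drawn with probability proportional to D(x)^ℓ, where D(x) is the distance to the nearest center chosen so far (ℓ = 2 gives k-means++ seeding). The function names `d_ell_sampling`, `bicriteria_seeding` and `kmeans_cost`, the default β = 2, and the toy data are hypothetical illustrations. The sketch does not reproduce the paper's analysis or its constants.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d_ell_sampling(points: List[Point], num_centers: int, ell: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[Point]:
    """Pick num_centers centers by D^ell sampling (ell = 2: k-means++ seeding).

    The first center is drawn uniformly; every later center is drawn with
    probability proportional to D(x)^ell, where D(x) is the distance from x
    to its nearest already-chosen center.
    """
    if rng is None:
        rng = random.Random()
    centers = [rng.choice(points)]
    # dist2[i]: squared distance from points[i] to its nearest chosen center.
    dist2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < num_centers:
        # D(x)^ell = (D(x)^2)^(ell / 2)
        weights = [d2 ** (ell / 2.0) for d2 in dist2]
        if sum(weights) == 0.0:
            break  # every point already coincides with a chosen center
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
        dist2 = [min(d2, sq_dist(p, points[idx])) for d2, p in zip(dist2, points)]
    return centers


def bicriteria_seeding(points: List[Point], k: int, beta: float = 2.0,
                       ell: float = 2.0,
                       rng: Optional[random.Random] = None) -> List[Point]:
    """Oversample: draw ceil(beta * k) centers instead of k (beta > 1)."""
    return d_ell_sampling(points, math.ceil(beta * k), ell, rng)


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # Toy data: three well-separated 2x2 groups of points.
    pts = [(cx + dx, cy + dy)
           for (cx, cy) in [(0, 0), (20, 0), (0, 20)]
           for dx in (0, 1) for dy in (0, 1)]
    k = 3
    plain = d_ell_sampling(pts, k, rng=random.Random(0))
    over = bicriteria_seeding(pts, k, beta=2.0, rng=random.Random(0))
    print(len(plain), kmeans_cost(pts, plain))
    print(len(over), kmeans_cost(pts, over))
```

In this sketch, running both calls with the same seed makes the first k oversampled centers coincide with the plain k-center run, so the βk-center cost can never exceed the k-center cost on that run. The abstract's claim is stronger: the expected cost of the βk centers is within a constant factor of the optimal k-center cost.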
Similar resources
A Bi-Criteria Approximation Algorithm for k-Means
We consider the classical k-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output βk > k clusters and must produce a clustering with cost at most α times the cost of the optimal set of k clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to...
A hybrid DEA-based K-means and invasive weed optimization for facility location problem
In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...
Cut Problems in Graphs with a Budget Constraint
We study budgeted variants of classical cut problems: the Multiway Cut problem, the Multicut problem, and the k-Cut problem, and provide approximation algorithms for these problems. Specifically, for the budgeted multiway cut and the k-cut problems we provide constant factor approximation algorithms. We show that the budgeted multicut problem is at least as hard to approximate as the sparsest c...
Approximation Algorithm for the Max k-CSP Problem
We present a ck/2^k approximation algorithm for the Max k-CSP problem (where c > 0.44 is an absolute constant). This result improves the previously best known algorithm by Hast, which has an approximation guarantee of Ω(k/(2^k log k)). Our approximation guarantee matches the upper bound of Samorodnitsky and Trevisan up to a constant factor (their result assumes the Unique Games Conjecture).
Scalable constant k-means approximation via heuristics on well-clusterable data
We present a simple heuristic clustering procedure, with running time independent of the data size, that combines random sampling with Single-Linkage (Kruskal’s algorithm), and show that with sufficient probability, it has a constant approximation guarantee with respect to the optimal k-means cost, provided an optimal solution satisfies a center-separability assumption. As the separation increa...